1 Applications II: Repeating pattern discovery and structure analysis from acoustic
music data 

Lie Lu, Muyuan Wang, Hong-Jiang Zhang 

October 2004 Proceedings of the 6th ACM SIGMM international workshop on 
Multimedia information retrieval 

Publisher: ACM Press 

Full text available: pdf(380.44 KB) Additional Information: full citation, abstract, references, index terms

Music and songs usually have repeating patterns and prominent structure. The automatic 
extraction of such repeating patterns and structure is useful for further music 
summarization, indexing and retrieval. In this paper, an effective approach of repeating 
pattern discovery and structure analysis of acoustic music data is proposed. In order to 
represent the melody similarity more accurately, in our approach, Constant Q transform is
utilized in feature extraction and a novel similarity measure ... 

Keywords: CQT, music structure, repeating pattern, structure-based distance measure 



Analysis and modeling of F0 contours for Cantonese text-to-speech

Yujia Li, Tan Lee, Yao Qian 

September 2004 ACM Transactions on Asian Language Information Processing
(TALIP), Volume 3 Issue 3
Publisher: ACM Press

Full text available: pdf(969.61 KB) Additional Information: full citation, abstract, references, index terms

For the generation of highly natural synthetic speech, the control of prosody is of primary 
importance. The fundamental frequency (F0) is one of the most important components of
speech prosody. This research investigates the variation of F0 in continuous Cantonese
speech, with the goal of establishing an effective mechanism of prosody control in 
Cantonese text-to-speech (TTS) applications. Cantonese is a commonly used Chinese 
dialect that is well known for being rich in tones. This article de ... 

Keywords: Chinese dialects, Text-to-speech, fundamental frequency, prosody, tones



Real-time acoustic modeling for distributed virtual environments 
Thomas Funkhouser, Patrick Min, Ingrid Carlbom

July 1999 Proceedings of the 26th annual conference on Computer graphics and
interactive techniques
Publisher: ACM Press/Addison-Wesley Publishing Co.

Full text available: pdf(262.94 KB) Additional Information: full citation, references, citings, index terms



Keywords: acoustic modeling, auralization, beam tracing, virtual environment systems, 
virtual reality 



Voice response systems
D. L. Lee, F. H. Lochovsky

December 1983 ACM Computing Surveys (CSUR), Volume 15 Issue 4
Publisher: ACM Press

Full text available: pdf(2.22 MB) Additional Information: full citation, references, index terms




5 Voice fonts for individuality representation and transformation
Ashish Verma, Arun Kumar 

February 2005 ACM Transactions on Speech and Language Processing (TSLP), Volume 2
Issue 1

Publisher: ACM Press

Full text available: pdf(201.04 KB) Additional Information: full citation, abstract, references, index terms

Speaker individuality transformation is used to modify the speech signal's characteristics 
so that it sounds as if it is spoken by another speaker. Previous methods for individuality
transformation use mapping functions which depend upon a pair of speakers. We introduce 
the paradigm of voice fonts to represent the individuality of a speaker, independent of 
other speakers. Several objective and subjective tests are conducted to evaluate the 
performance of the approaches proposed for the voice fon ... 

Keywords: Speech individuality, voice conversion, voice fonts 




6 Acoustic modeling: Minimizing speaker variation effects for speaker-independent
speech recognition 
Xuedong Huang 

February 1992 Proceedings of the workshop on Speech and Natural Language HLT '91 

Publisher: Association for Computational Linguistics 

Full text available: pdf(605.36 KB) Additional Information: full citation, abstract, references

For speaker-independent speech recognition, speaker variation is one of the major error 
sources. In this paper, a speaker-independent normalization network is constructed such
that speaker variation effects can be minimized. To achieve this goal, multiple speaker 
clusters are constructed from the speaker-independent training database. A codeword- 
dependent neural network is associated with each speaker cluster. The cluster that 
contains the largest number of speakers is designated as the golden c ... 

Acoustic modeling: Subphonetic modeling for speech recognition
Mei-Yuh Hwang, Xuedong Huang 

February 1992 Proceedings of the workshop on Speech and Natural Language HLT '91 
Publisher: Association for Computational Linguistics 

Full text available: pdf(611.33 KB) Additional Information: full citation, abstract, references

How to capture important acoustic clues and estimate essential parameters reliably is one
of the central issues in speech recognition, since we will never have sufficient training data
to model various acoustic-phonetic phenomena. Successful examples include subword
models with many smoothing techniques. In comparison with subword models,
subphonetic modeling may provide a finer level of details. We propose to model
subphonetic events with Markov states and treat the state in phonetic hidden Mar ...

8 Spoken dialogue technology: enabling the conversational user interface 
Michael F. McTear 

March 2002 ACM Computing Surveys (CSUR), Volume 34 Issue 1
Publisher: ACM Press 

Full text available: pdf(987.69 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Spoken dialogue systems allow users to interact with computer-based applications such as
databases and expert systems by using natural spoken language. The origins of spoken
dialogue systems can be traced back to Artificial Intelligence research in the 1950s
concerned with developing conversational interfaces. However, it is only within the last
decade or so, with major advances in speech technology, that large-scale working systems 
have been developed and, in some cases, introduced into commerc ... 

Keywords: Dialogue management, human computer interaction, language generation, 
language understanding, speech recognition, speech synthesis 



9 The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve
Uncertainty 

Lee D. Erman, Frederick Hayes-Roth, Victor R. Lesser, D. Raj Reddy 
June 1980 ACM Computing Surveys (CSUR), Volume 12 Issue 2
Publisher: ACM Press

Full text available: pdf(3.83 MB) Additional Information: full citation, references, citings, index terms




10 Continuous speech recognition I: Acoustic modeling of subword units for large 
vocabulary speaker independent speech recognition 
Chin-Hui Lee, Lawrence R. Rabiner, Roberto Pieraccini, Jay G. Wilpon

October 1989 Proceedings of the workshop on Speech and Natural Language HLT '89 
Publisher: Association for Computational Linguistics 

Full text available: pdf(904.19 KB) Additional Information: full citation, abstract, references

The field of large vocabulary, continuous speech recognition has advanced to the point 
where there are several systems capable of attaining between 90 and 95% word accuracy 
for speaker independent recognition of a 1000 word vocabulary, spoken fluently for a task
with a perplexity (average word branching factor) of about 60. There are several factors 
which account for the high performance achieved by these systems, including the use of 
hidden Markov models (HMM) for acoustic modeling, the use of ... 



Video Processing: Multimedia edges: finding hierarchy in all dimensions 
Malcolm Slaney, Dulce Ponceleon, James Kaufman 

October 2001 Proceedings of the ninth ACM international conference on Multimedia
Publisher: ACM Press

Full text available: pdf(6.41 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a new unified representation for the information in a video. We
reduce the dimensionality of the signal with either a singular-value decomposition (on the
semantic and image data) or mel-frequency cepstral coefficients (on the audio data) and
then concatenate the vectors to form a multi-dimensional representation of the video.
Using scale-space techniques we find large jumps in the video's path, which we call edges.
We use these techniques to analyze the temporal properti ...

Keywords: audio, automatic segmentation, color space, hierarchy, images, latent 
semantic indexing, multimedia, video, scale space, semantic space, singular-value 
decomposition, temporal properties 



12 Semiring parsing 
Joshua Goodman 

December 1999 Computational Linguistics, Volume 25 Issue 4

Publisher: MIT Press

Full text available: pdf(2.13 MB), Publisher Site Additional Information: full citation, abstract, references, citings

We synthesize work on parsing algorithms, deductive parsing, and the theory of algebra 
applied to formal languages into a general system for describing parsers. Each parser
performs abstract computations using the operations of a semiring. The system allows a 
single, simple representation to be used for describing parsers that compute recognition, 
derivation forests, Viterbi, n-best, inside values, and other values, simply by substituting 
the operations of different semirings. We also show how t ... 

Using tone information in Cantonese continuous speech recognition
Tan Lee, Wai Lau, Y. W. Wong, P. C. Ching 

March 2002 ACM Transactions on Asian Language Information Processing (TALIP),
Volume 1 Issue 1
Publisher: ACM Press

Full text available: pdf(800.46 KB) Additional Information: full citation, abstract, references, index terms

In Chinese languages, tones carry important information at various linguistic levels. This
research is based on the belief that tone information, if acquired accurately and utilized 
effectively, contributes to the automatic speech recognition of Chinese. In particular, we 
focus on the Cantonese dialect, which is spoken by tens of millions of people in Southern 
China and Hong Kong. Cantonese is well known for its complicated tone system, which 
makes automatic tone recognition very difficult. This ... 

Keywords: Chinese dialects, F0 normalization, knowledge integration, speech recognition,
tone recognition 



Poster session 1: A segment-based audio-visual speech recognizer: data collection,
development, and initial experiments 
Timothy J. Hazen, Kate Saenko, Chia-Hao La, James R. Glass 
October 2004 Proceedings of the 6th international conference on Multimodal
interfaces
Publisher: ACM Press

Full text available: pdf(276.15 KB) Additional Information: full citation, abstract, references, index terms

This paper presents the development and evaluation of a speaker-independent audio- 
visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. 
To support this research, we have collected a new video corpus, called Audio-Visual TIMIT 
(AV-TIMIT), which consists of 4 total hours of read speech collected from 223 different 
speakers. This new corpus was used to evaluate our new AVSR system which incorporates 
a novel audio-visual integration scheme using segment-constral ... 

Keywords: audio-visual corpora, audio-visual speech recognition



15 Video Rewrite: driving visual speech with audio 
Christoph Bregler, Michele Covell, Malcolm Slaney

August 1997 Proceedings of the 24th annual conference on Computer graphics and
interactive techniques
Publisher: ACM Press/Addison-Wesley Publishing Co.

Full text available: pdf(179.44 KB) Additional Information: full citation, references, citings, index terms




Keywords: facial animation, lip sync 



Audio-visual speech recognition using red exclusion and neural networks 
Trent W. Lewis, David M. W. Powers 

January 2002 Australian Computer Science Communications, Proceedings of the
twenty-fifth Australasian conference on Computer science - Volume 4
CRPITS '02, Volume 24 Issue 1
Publisher: Australian Computer Society, Inc., IEEE Computer Society Press

Full text available: pdf(984.26 KB) Additional Information: full citation, abstract, references, index terms

Automatic speech recognition (ASR) performs well under restricted conditions, but 
performance degrades in noisy environments. Audio-Visual Speech Recognition (AVSR) 
combats this by incorporating a visual signal into the recognition. This paper briefly 
reviews the contribution of psycholinguistics to this endeavour and the recent advances in
machine AVSR. An important first step in AVSR is that of feature extraction from the
mouth region and a technique developed by the authors is briefly present ...

Keywords: audio-visual speech recognition, neural networks, sensor fusion



17 ISIS: an adaptive, trilingual conversational system with interleaving interaction and 
delegation dialogs

Helen Meng, P. C. Ching, Shuk Fong Chan, Yee Fong Wong, Cheong Chat Chan
September 2004 ACM Transactions on Computer-Human Interaction (TOCHI), Volume 11
Issue 3

Publisher: ACM Press

Full text available: pdf(3.71 MB) Additional Information: full citation, abstract, references, index terms

ISIS (Intelligent Speech for Information Systems) is a trilingual spoken dialog system 
(SDS) for the stocks domain. It handles two dialects of Chinese (Cantonese and 
Putonghua) as well as English— the predominant languages in our region. The system 
supports spoken language queries regarding stock market information and simulated 
personal portfolios. The conversational interface is augmented with a screen display that 
can capture mouse-clicks as well as textual input by typing or stylus-writing. ... 

Keywords: Human-computer spoken language interface, interaction and delegation 
dialogs 




Document and passage retrieval based on hidden Markov models 
Elke Mittendorf, Peter Schauble

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(827.36 KB) Additional Information: full citation, references, citings, index terms



Music: Automated extraction of music snippets 
Lie Lu, Hong-Jiang Zhang 

November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Publisher: ACM Press 

Full text available: pdf(316.12 KB) Additional Information: full citation, abstract, references, citings, index terms

Similar to image and video thumbnail, music snippet is defined as the most representative
or highlight excerpt of a music clip, and can be used efficiently for fast browsing large 
number of music files. Music snippet is usually a part of the repeated melody, main theme 
or chorus. In this paper, we present an approach to extracting music snippet 
automatically. In our approach, the most salient segment of the music is firstly detected 
based on its occurrence frequency and energy information. Meanw ...

Keywords: music saliency, music snippet, music structure, music thumbnail, musical 
phrase, tempo estimation 



20 Recursive hashing functions for n-grams
Jonathan D. Cohen

July 1997 ACM Transactions on Information Systems (TOIS), Volume 15 Issue 3
Publisher: ACM Press

Full text available: pdf(361.86 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Many indexing, retrieval, and comparison methods are based on counting or cataloguing n- 
grams in streams of symbols. The fastest method of implementing such operations is 
through the use of hash tables. Rapid hashing of consecutive n-grams is best done using a 
recursive hash function, in which the hash value of the current n-gram is derived from the
hash value of its predecessor. This article generalizes recursive hash functions found in
the ... 

Keywords: n-grams, hashing, hashing functions, recursive hashing 
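
The idea in the abstract above, deriving each n-gram's hash from the hash of its predecessor, can be illustrated with a generic polynomial rolling hash. This is a minimal sketch under stated assumptions and is not claimed to be any of the functions the article generalizes: the names `BASE`, `MODULUS`, `direct_hash`, and `recursive_hashes` are hypothetical, and symbols are assumed to be small non-negative integers such as byte values.

```python
from typing import Iterator, Sequence

# Hypothetical parameters chosen for illustration only.
BASE = 257
MODULUS = (1 << 61) - 1


def direct_hash(gram: Sequence[int]) -> int:
    """Hash one n-gram from scratch as a polynomial in BASE (O(n) work)."""
    h = 0
    for symbol in gram:
        h = (h * BASE + symbol) % MODULUS
    return h


def recursive_hashes(symbols: Sequence[int], n: int) -> Iterator[int]:
    """Yield the hash of every consecutive n-gram, updating in O(1) per step."""
    if n <= 0 or len(symbols) < n:
        return
    h = direct_hash(symbols[:n])
    yield h
    # Weight carried by the oldest symbol of the current window.
    top = pow(BASE, n - 1, MODULUS)
    for i in range(n, len(symbols)):
        # Remove the symbol leaving the window, then shift in the new one.
        h = (h - symbols[i - n] * top) % MODULUS
        h = (h * BASE + symbols[i]) % MODULUS
        yield h
```

Each update touches only the symbol leaving the window and the symbol entering it, so hashing all n-grams of a stream costs time linear in the stream length rather than in the stream length times n.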



Real-time acoustic modeling for distributed virtual environments

Thomas Funkhouser, Patrick Min, Ingrid Carlbom 

July 1999 Proceedings of the 26th annual conference on Computer graphics and 
interactive techniques 

Publisher: ACM Press/Addison-Wesley Publishing Co. 

Full text available: pdf(262.94 KB) Additional Information: full citation, references, citings, index terms



Keywords: acoustic modeling, auralization, beam tracing, virtual environment systems,
virtual reality 



A beam tracing approach to acoustic modeling for interactive virtual environments 
Thomas Funkhouser, Ingrid Carlbom, Gary Elko, Gopal Pingali, Mohan Sondhi, Jim West
July 1998 Proceedings of the 25th annual conference on Computer graphics and
interactive techniques
Publisher: ACM Press

Full text available: pdf(325.10 KB) Additional Information: full citation, references, citings, index terms



Keywords: acoustic modeling, auralization, beam tracing, spatialized sound, virtual
environment systems, virtual reality 



Dialogue act modeling for automatic tagging and recognition of conversational
speech

Andreas Stolcke, Noah Coccaro, Rebecca Bates, Paul Taylor, Carol Van Ess-Dykema, Klaus
Ries, Elizabeth Shriberg, Daniel Jurafsky, Rachel Martin, Marie Meteer
September 2000 Computational Linguistics, Volume 26 Issue 3
Publisher: MIT Press

Full text available: pdf(2.53 MB), Publisher Site Additional Information: full citation, abstract, references, citings

We describe a statistical approach for modeling dialogue acts in conversational speech, 
i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, 
DISAGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on
lexical, collocational, and prosodic cues, as well as on the discourse coherence of the 
dialogue act sequence. The dialogue model is based on treating the discourse structure of 
a conversation as a hidden ... 

Speech repairs, intonational phrases, and discourse markers: modeling speakers' 
utterances in spoken dialogue 
Peter A. Heeman, James F. Allen

December 1999 Computational Linguistics, Volume 25 Issue 4
Publisher: MIT Press

Full text available: pdf(3.03 MB), Publisher Site Additional Information: full citation, abstract, references, citings

Interactive spoken dialogue provides many new challenges for natural language 
understanding systems. One of the most critical challenges is simply determining the 
speaker's intended utterances: both segmenting a speaker's turn into utterances and 
determining the intended words in each utterance. Even assuming perfect word
recognition, the latter problem is complicated by the occurrence of speech repairs, which 
occur where speakers go back and change (or repeat) something they just said. The 
word ... 

Technical papers: Three-dimensional routing in underwater acoustic sensor networks
Dario Pompili, Tommaso Melodia 

October 2005 Proceedings of the 2nd ACM international workshop on Performance
evaluation of wireless ad hoc, sensor, and ubiquitous networks PE-WASUN '05

Publisher: ACM Press

Full text available: pdf(379.93 KB) Additional Information: full citation, abstract, references, index terms

Underwater sensor networks will find applications in oceanographic data collection, 
pollution monitoring, offshore exploration, disaster prevention, assisted navigation, and 
tactical surveillance applications. In this paper, the problem of data gathering in a 3D 
underwater acoustic sensor network is investigated at the network layer, by considering 
the interactions between the routing functions and the characteristics of the underwater
channel. Two routing algorithms are proposed for delay-insen ... 

Keywords: mathematical programming/optimization, routing algorithms, underwater 
acoustic sensor networks 



6 Acoustic modeling: Minimizing speaker variation effects for speaker-independent
speech recognition 
Xuedong Huang 

February 1992 Proceedings of the workshop on Speech and Natural Language HLT '91 

Publisher: Association for Computational Linguistics 

Full text available: pdf(605.36 KB) Additional Information: full citation, abstract, references

For speaker-independent speech recognition, speaker variation is one of the major error 
sources. In this paper, a speaker-independent normalization network is constructed such 
that speaker variation effects can be minimized. To achieve this goal, multiple speaker 
clusters are constructed from the speaker-independent training database. A codeword- 
dependent neural network is associated with each speaker cluster. The cluster that 
contains the largest number of speakers is designated as the golden c ... 


Technical session 5: student best paper contest: LyricAlly: automatic synchronization
of acoustic musical signals and textual lyrics

Ye Wang, Min-Yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

We present a prototype that automatically aligns acoustic musical signals with their 
corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle 
this problem using a multimodal approach, where the appropriate pairing of audio and text 
processing helps create a more accurate system. Our audio processing technique uses a 
combination of top-down and bottom-up approaches, combining the strength of low-level 
audio features and high-level musical knowledge to determine ... 

Keywords: audio/text synergy, karaoke, lyric alignment, music knowledge, vocal 
detection 



8 Attacking passwords and bringing down the network: Keyboard acoustic emanations
revisited 

Li Zhuang, Feng Zhou, J. D. Tygar 

November 2005 Proceedings of the 12th ACM conference on Computer and
communications security CCS '05
Publisher: ACM Press

Full text available: pdf(198.94 KB) Additional Information: full citation, abstract, references, index terms

We examine the problem of keyboard acoustic emanations. We present a novel attack 
taking as input a 10-minute sound recording of a user typing English text using a 
keyboard, and then recovering up to 96% of typed characters. There is no need for a 
labeled training recording. Moreover the recognizer bootstrapped this way can even 
recognize random text such as passwords: In our experiments, 90% of 5-character 
random passwords using only letters can be generated in fewer than 20 attempts by an 
adve ... 

Keywords: HMM, acoustic emanations, cepstrum, computer security, electronic 
eavesdropping, hidden Markov models, human factors, keyboards, learning theory, 
privacy, signal analysis 




Acoustic environment as an indicator of social and physical context 
Dan Smith, Ling Ma, Nick Ryan 

March 2006 Personal and Ubiquitous Computing, Volume 10 Issue 4
Publisher: Springer-Verlag

Full text available: pdf(731.76 KB) Additional Information: full citation, abstract

Acoustic environments provide many valuable cues for context-aware computing 
applications. From the acoustic environment we can infer the types of activity, 
communication modes and other actors involved in the activity. Environmental or 
background noise can be classified with a high degree of accuracy using recordings from 
microphones commonly found in PDAs and other consumer devices. We describe an 
acoustic environment recognition system incorporating an adaptive learning mechanism 
and its use ... 

Keywords: Acoustic environment, Adaptive feedback, Classification, Context awareness,
Machine learning, Mobile computing



10 Continuous speech recognition I: Acoustic modeling of subword units for large
vocabulary speaker independent speech recognition
Chin-Hui Lee, Lawrence R. Rabiner, Roberto Pieraccini, Jay G. Wilpon
October 1989 Proceedings of the workshop on Speech and Natural Language HLT '89
Publisher: Association for Computational Linguistics

Full text available: pdf(904.19 KB) Additional Information: full citation, abstract, references

The field of large vocabulary, continuous speech recognition has advanced to the point 
where there are several systems capable of attaining between 90 and 95% word accuracy 
for speaker independent recognition of a 1000 word vocabulary, spoken fluently for a task 
with a perplexity (average word branching factor) of about 60. There are several factors 
which account for the high performance achieved by these systems, including the use of 
hidden Markov models (HMM) for acoustic modeling, the use of ... 

Planning the acoustic urban environment: a GIS-centered approach 
Maria Piedade G. Oliveira, Eduardo Bauzer Medeiros, Clodoveu A. Davis
November 1999 Proceedings of the 7th ACM international symposium on Advances in
geographic information systems
Publisher: ACM Press

Full text available: pdf(114.98 KB) Additional Information: full citation, references, index terms



Keywords: geographic applications, pollution control, urban noise 



12 Analysis and modeling of F0 contours for Cantonese text-to-speech
Yujia Li, Tan Lee, Yao Qian

September 2004 ACM Transactions on Asian Language Information Processing
(TALIP), Volume 3 Issue 3
Publisher: ACM Press

Full text available: pdf(969.61 KB) Additional Information: full citation, abstract, references, index terms

For the generation of highly natural synthetic speech, the control of prosody is of primary 
importance. The fundamental frequency (F0) is one of the most important components of
speech prosody. This research investigates the variation of F0 in continuous Cantonese
speech, with the goal of establishing an effective mechanism of prosody control in 
Cantonese text-to-speech (TTS) applications. Cantonese is a commonly used Chinese 
dialect that is well known for being rich in tones. This article de ... 

Keywords: Chinese dialects, Text-to-speech, fundamental frequency, prosody, tones 



Modeling acoustics in virtual environments using the uniform theory of diffraction
Nicolas Tsingos, Thomas Funkhouser, Addy Ngan, Ingrid Carlbom 

August 2001 Proceedings of the 28th annual conference on Computer graphics and 
interactive techniques 

Publisher: ACM Press 

Full text available: pdf(6.03 MB) Additional Information: full citation, abstract, references, citings, index terms

Realistic modeling of reverberant sound in 3D virtual worlds provides users with important 
cues for localizing sound sources and understanding spatial properties of the environment. 
Unfortunately, current geometric acoustic modeling systems do not accurately simulate 
reverberant sound. Instead, they model only direct transmission and specular reflection, 
while diffraction is either ignored or modeled through statistical approximation. However, 
diffraction is important for correct interpretati ... 



14 Acoustic modeling: Subphonetic modeling for speech recognition

Mei-Yuh Hwang, Xuedong Huang

February 1992 Proceedings of the workshop on Speech and Natural Language HLT '91
Publisher: Association for Computational Linguistics

Full text available: pdf(611.33 KB) Additional Information: full citation, abstract, references

How to capture important acoustic clues and estimate essential parameters reliably is one 
of the central issues in speech recognition, since we will never have sufficient training data 
to model various acoustic-phonetic phenomena. Successful examples include subword 
models with many smoothing techniques. In comparison with subword models, 
subphonetic modeling may provide a finer level of details. We propose to model 
subphonetic events with Markov states and treat the state in phonetic hidden Mar ...

Computer graphics visualization for acoustic simulation
A. Stettner, D. P. Greenberg

July 1989 ACM SIGGRAPH Computer Graphics, Proceedings of the 16th annual
conference on Computer graphics and interactive techniques SIGGRAPH
'89, Volume 23 Issue 3
Publisher: ACM Press

Full text available: pdf(14.64 MB) Additional Information: full citation, abstract, references, citings, index terms

Computer simulations can be used to generate the spatial and temporal data describing 
the acoustical behavior of performance halls, but typically the analytical results are difficult 
to assimilate and compare. By using computer graphics to display the multi-dimensional 
data, substantially greater amounts of information than that conveyed by standard 
techniques can be communicated to the designer. This allows designs of different 
acoustical spaces to be tested, evaluated, and compared. An example ...

16 Acoustic modeling and robust CSR: High-accuracy large-vocabulary speech
recognition using mixture tying and consistency modeling

Vassilios Digalakis, Hy Murveit

March 1994 Proceedings of the workshop on Human Language Technology HLT '94
Publisher: Association for Computational Linguistics

Full text available: pdf(587.65 KB) Additional Information: full citation, abstract, references